An Information Extraction Method for Multiple Data Sources
Abstract
We developed an information extraction method for multiple data sources, that is, for heterogeneous collections such as Internet web pages. Because different kinds of data generally differ in writing style and vocabulary, information extraction over a mixed dataset tends to be less accurate than extraction over a single kind of data. Our method divides the data by clustering and then learns extraction rules, which increases accuracy even when various kinds of data are mixed. In our experiment, we applied the method to the NTCIR8 Technical Trend Map Creation subtask, which uses two kinds of data (patents and technical papers), and obtained better precision than a conventional information extraction method.
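The abstract does not specify the clustering algorithm, the document features, or the rule formalism, so the following is only a minimal sketch of the general idea under loudly stated assumptions, not the authors' implementation. It assumes bag-of-words vectors clustered with spherical k-means (farthest-point seeding) and hypothetical single-token "trigger" rules, where the word immediately preceding an annotated target becomes a rule. It also assumes that rules are learned separately for each cluster and that a new text is routed to its most similar cluster. All names (`cluster`, `learn_rules`, `extract`, `ClusteredExtractor`) and the toy sentences are invented for illustration and are not NTCIR8 data.

```python
# Hypothetical sketch: the clustering method, features and rule format are
# assumptions for illustration, not taken from the paper.
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words Counters."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts, k, iters=10):
    """Spherical k-means on bag-of-words vectors with farthest-point seeding."""
    vecs = [Counter(tokenize(t)) for t in texts]
    seeds = [0]
    while len(seeds) < k:
        # Next seed: the document least similar to every seed chosen so far.
        rest = [i for i in range(len(vecs)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vecs[i], vecs[s]) for s in seeds)))
    centroids = [vecs[s] for s in seeds]
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:
                # Summed counts point in the mean direction; cosine ignores scale.
                centroids[c] = sum(members, Counter())
    return labels, centroids


def learn_rules(examples):
    """Hypothetical rule learner: the token just before each annotated target."""
    triggers = set()
    for text, target in examples:
        toks = tokenize(text)
        triggers.update(toks[i - 1] for i in range(1, len(toks)) if toks[i] == target.lower())
    return triggers


def extract(text, triggers):
    """Apply trigger rules: return every token that directly follows a trigger."""
    toks = tokenize(text)
    return [toks[i + 1] for i in range(len(toks) - 1) if toks[i] in triggers]


class ClusteredExtractor:
    """Cluster the training texts, then learn a separate rule set per cluster."""

    def __init__(self, k):
        self.k = k

    def fit(self, examples):
        labels, self.centroids = cluster([text for text, _ in examples], self.k)
        self.rules = [
            learn_rules([ex for ex, lab in zip(examples, labels) if lab == c])
            for c in range(self.k)
        ]
        return self

    def predict(self, text):
        # Route the text to its most similar cluster and use only that cluster's rules.
        vec = Counter(tokenize(text))
        best = max(range(self.k), key=lambda c: cosine(vec, self.centroids[c]))
        return extract(text, self.rules[best])


if __name__ == "__main__":
    train = [
        ("The present invention relates to a device comprising piezoelectric elements", "piezoelectric"),
        ("The present invention relates to a motor comprising permanent magnets", "permanent"),
        ("In this paper we propose a method based on clustering", "clustering"),
        ("In this paper we propose a model based on regression", "regression"),
    ]
    model = ClusteredExtractor(k=2).fit(train)
    query = "The present invention relates to a layer deposited on glass comprising oxide films"
    print(model.predict(query))  # ['oxide']
    pooled = set().union(*model.rules)
    print(extract(query, pooled))  # ['glass', 'oxide']
```

In this toy run the patent-style cluster learns the trigger `comprising` and the paper-style cluster learns `on`. A single pooled rule set would also fire `on` inside the patent sentence and wrongly extract `glass`. This is one plausible illustration of the cross-style interference that dividing the data by clustering is meant to reduce. The sketch needs at least `k` training texts.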
Publication date: 2010